Large-scale topological and dynamical properties of the Internet
We study the large-scale topological and dynamical properties of real Internet maps at the autonomous system level, collected over a three-year time interval. We find that the connectivity structure of the Internet presents average quantities and statistical distributions settled in a well-defined stationary state. The large-scale properties are characterized by a scale-free topology consistent with previous observations. Correlation functions and clustering coefficients exhibit a remarkable structure due to the underlying hierarchical organization of the Internet. The study of the Internet time evolution shows a growth dynamics with aging features typical of recently proposed growing network models. We compare the properties of growing network models with the present analysis of real Internet data.
I. INTRODUCTION

The Internet is a prime example of a growing complex network […], interconnecting millions of computers around the world. Growing networks exhibit a high degree of wiring entanglement, which develops during their dynamical evolution. This feature, at the heart of the new and interesting topological properties recently observed in growing network systems […], has drawn the attention of the research community to the large-scale properties of router-level maps of the Internet […]. The statistical analyses performed so far have focused on several quantities exhibiting non-trivial properties: wiring redundancy and clustering […], the distribution of chemical distances […], and the eigenvalue spectra of the connectivity matrix [10]. Notably, the presence of a power-law connectivity distribution […] makes the Internet an example of the recently identified class of scale-free networks […]. This evidence implies the absence of any characteristic connectivity (that is, large connectivity fluctuations) and a high heterogeneity of the network structure.

As widely pointed out in the literature […], a deeper empirical understanding of the topological properties of the Internet is fundamental for the development of realistic Internet map generators, which in turn are used to test and optimize Internet protocols. In fact, the Internet topology has a great influence on the dynamics of the data traffic carried on top of it. Hence, a better understanding of the Internet structure is of primary importance in the design of new routing […] and searching algorithms […], and in protecting the network from virus spreading and node failures […, 23]. In this perspective, the direct measurement and statistical characterization of real Internet maps are crucial for identifying the basic mechanisms that rule the Internet structure and dynamics.



In this work, we consider the evolution of real Internet maps from 1997 to 2000, collected by the National Laboratory for Applied Network Research (NLANR) […], in order to study the underlying dynamical processes leading to the Internet structure and topology. We provide a statistical analysis of several average properties; in particular, we consider the average connectivity, clustering coefficient, chemical distance, and betweenness. These quantities provide a preliminary test of the stationarity of the network. The scale-free nature of the Internet has been pointed out by inspecting the connectivity probability distribution, and it implies that the fluctuations around the average connectivity are not bounded. In order to provide a full characterization of the scale-free properties of the Internet, we analyze the connectivity and betweenness probability distributions for different time snapshots of the Internet maps. We observe that these distributions exhibit an algebraic behavior and are characterized by scaling exponents that are stationary in time. The chemical distance between pairs of nodes, on the other hand, appears to be sharply peaked around its average value, providing striking evidence for the presence of well-defined small-world properties. A more detailed picture of the Internet can be achieved by studying higher-order correlation functions of the network. In this sense, we show that the hierarchical structure of the Internet is reflected in non-trivial scale-free betweenness and connectivity correlation functions. Finally, we study several quantities related to the growth dynamics of the network. The analysis points out the presence of two distinct wiring processes: the first concerns newly added nodes, while the second is related to already existing nodes increasing their interconnections. We confirm that newly added nodes establish new links with the linear preferential attachment rule often used in modeling growing networks […]. In addition, a study of the connectivity evolution of a single node shows a rich dynamical behavior with aging properties. The present study could provide some hints for a more realistic modeling of the Internet evolution, and with this purpose in mind we discuss some of the existing growing network models in the light of our findings. A short account of these results appeared in Ref. […].

The paper is organized as follows. In Section II we describe the Internet maps used in our study. Sec. III is devoted to the study of average quantities as a function of time. In Sec. IV we provide the analysis of the statistical distributions characterizing the Internet topology. We obtain evidence for the scale-free nature of this network as well as for the stationarity in time of this property. In Sec. V we characterize the hierarchical structure of the Internet by the statistical analysis of the betweenness and connectivity correlation functions. Sec. VI reports the study of dynamical properties such as the preferential attachment and the evolution of the average connectivity of newly added nodes. These properties, which show aging features, are the basis for the development of Internet dynamical models. Sec. VII is devoted to a detailed discussion of some Internet models as compared with the presented real data analysis. Finally, in Sec. VIII we draw our conclusions and perspectives.



II. MAPPING THE INTERNET

Several Internet mapping projects are currently devoted to obtaining high-quality router-level maps of the Internet. In most cases, the map is constructed by using a hop-limited probe (such as the UNIX traceroute tool) from a single location in the network. In this case the result is a "directed" map as seen from a specific location on the Internet […]. This approach does not yield a complete map of the Internet, because cross-links and other technical problems (such as multiple Internet provider aliases) are not considered. Heuristic methods to take these problems into account have been proposed (see for instance Ref. […]). However, their reliability, and hence the completeness of the maps constructed this way, is not clear.

A different representation of the Internet is obtained by mapping the autonomous systems (AS) topology. The Internet can be considered as a collection of subnetworks that are connected together. Within each subnetwork the information is routed using an internal algorithm that may differ from one subnetwork to another. Thus, each subnet is an independent unit of the Internet and it is often referred to as an AS. These AS communicate with one another using a specific routing protocol, the Border Gateway Protocol. Each AS number approximately maps to an Internet Service Provider (ISP), and their links are inter-ISP connections. In this case it is possible to collect data from several probing stations to obtain complete interconnectivity maps (see Refs. […] for a technical description of these projects). In particular, the NLANR project has been collecting data since Nov. 1997, and it provides topological as well as dynamical information on a consistent subset of the Internet. The first Nov. 1997 map contains 3180 AS, and it has grown in time until the Dec. 1999 measurement, consisting of 6374 AS. In the following we consider the graph whose nodes represent the AS and whose links represent the adjacencies (interconnections) between AS. In particular we focus on three different snapshots corresponding to November 8th 1997, 1998, and 1999, which will be referred to as AS97, AS98, and AS99, respectively.

Year    N_new   N_dead   N_dead (k > 10)
1997      309      129
1998     1990      887        14
1999     3410     1713        68

TABLE I: Total number of new (N_new) and deleted (N_dead) nodes in the years 1997, 1998, and 1999. We also report the number of deleted nodes with connectivity k > 10.

The NLANR connectivity maps are collected with a resolution of one day and change from day to day. These changes are due to the addition (birth) and deletion (death) of nodes and links, but also to the flickering of connections, so that a node may appear to be isolated (not mapped) from time to time. A simple test, however, shows that flickering is appreciable only for nodes with low connectivity. We compute the ratio r between the number of days in which a node is observed in the NLANR maps and the total number of days after the first appearance of the node, averaged over all nodes in the maps. The analysis reveals that r ≈ 1 for nodes with connectivity k > 10 and r > 0.65 for nodes with k < 10. Hence, nodes with k < 10 have fluctuations that must be taken into account. In order to shed light on this point, we inspect the incidence of death events with respect to the creation of new nodes. We count a death event only if a node is not observed in the map during a one-year time interval. In Table I we show the total number of death events in a year, for 1997, 1998, and 1999, in comparison with the total number of new nodes created. The AS birth rate turns out to be about twice the death rate. More interestingly, if we restrict the analysis to nodes with connectivity k > 10, the death rate is reduced to a few percent of the birth rate. This clearly indicates that only poorly connected nodes have an appreciable probability to disappear. This fact is easily understandable in terms of the market competition among ISPs, where small newcomers are the ones most likely to go out of business.
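The flickering test above can be made concrete with a short sketch. It assumes (our assumption, not the NLANR data format) that for every AS the days on which it appears in the daily maps are available as a set of integer day indices, and it uses the hypothetical helpers `presence_ratio` and `average_ratio_by_class`; the split at k = 10 follows the text, with k = 10 itself placed in the low-connectivity class as a further assumption.

```python
from collections import defaultdict


def presence_ratio(observed_days, last_day):
    """r = (days on which the node is mapped) / (days since its first appearance).

    observed_days: set of integer day indices on which the node appears.
    last_day: index of the last daily map in the data set (inclusive).
    """
    first = min(observed_days)
    span = last_day - first + 1  # days elapsed since first appearance, inclusive
    return len(observed_days) / span


def average_ratio_by_class(observations, degree, last_day, k_cut=10):
    """Average r separately over nodes with k > k_cut ("high") and k <= k_cut ("low")."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for node, days in observations.items():
        cls = "high" if degree[node] > k_cut else "low"
        sums[cls] += presence_ratio(days, last_day)
        counts[cls] += 1
    return {cls: sums[cls] / counts[cls] for cls in counts}
```

A node mapped on every day since its birth gets r = 1; one that is missing on half of those days gets r = 0.5.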



III. AVERAGE PROPERTIES AND STATIONARITY

The growth rate of AS maps reveals that the Internet is a rapidly evolving network. Thus, it is extremely important to know whether or not it has reached a stationary state whose average properties are time-independent. This would imply that, despite the continuous increase of nodes and connections in the system, the topological properties of the network do not change appreciably in time. As a first step, we have analyzed the time behavior of several average quantities: the average connectivity ⟨k⟩, the clustering coefficient ⟨c⟩, the average chemical distance ⟨d⟩, and the average betweenness ⟨b⟩.

The connectivity k_i of a node i is defined as the number of connections of this node with other nodes in the network, and ⟨k⟩ is the average of k_i over all nodes in the network. Since each connection contributes to the connectivity of two nodes, we have ⟨k⟩ = 2E/N, where E is the total number of connections and N is the number of nodes. Both E and N increase with time, but their ratio remains almost constant. The average connectivity for the years 1997, 1998, and 1999 (averaged over all the AS maps available for that year) is shown in Table II. On average each node has three to four connections, which is a small number compared with that of a fully connected network of the same size (⟨k⟩ = N − 1 ~ 10^3). The average connectivity gives information about the number of connections of any node but not about the overall structure of these connections. More information can be obtained using the clustering coefficient introduced in Ref. […]. The number of neighbors of a node i is given by its connectivity k_i. In turn, these neighbors can be connected among themselves, forming triangles with node i. The clustering coefficient c_i is then defined as the ratio between the number of connections among the k_i neighbors of a given node i and its maximum possible value, k_i(k_i − 1)/2. The average clustering coefficient ⟨c⟩ is the average of c_i over all nodes in the network. The clustering coefficient thus measures how well the neighbors of any node are locally interconnected. The maximum value of ⟨c⟩ is 1, corresponding to a fully connected network. For random graphs […], which are constructed by connecting nodes at random with a fixed probability p, the clustering coefficient decreases with the network size N as ⟨c⟩_rand = ⟨k⟩/N. On the contrary, it remains constant for regular lattices. The average clustering coefficient obtained for the years 1997, 1998, and 1999 is shown in Table II. As can be seen, the clustering coefficient of the AS maps increases slowly with increasing N and takes values ⟨c⟩ ≈ 0.2, two orders of magnitude larger than the value ⟨c⟩_rand ~ 10^-3 corresponding to a random graph with the same number of nodes. Therefore, the AS maps are far from being a random graph, a feature that can be naively understood using the following argument. In AS maps all the connections among nodes are treated as equivalent, but in reality each of them is characterized by a real-space length, the actual length of the physical connection between the AS. The larger this length, the higher the costs of installation and maintenance of the line, which favors connections between nearby nodes. It is thus likely that nodes within the same geographical region have a large number of connections among them, increasing in this way the local clustering coefficient.
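A minimal Python sketch of the two definitions above, assuming the map is given as a plain list of undirected AS-AS links. The helper names are hypothetical, and assigning c_i = 0 to nodes with k_i < 2 is a common convention that we adopt here as an assumption.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from (u, v) links; self-loops and duplicates are ignored."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def average_connectivity(adj):
    """<k> = 2E/N: every link adds one to the connectivity of each of its two endpoints."""
    n_links = sum(len(nbrs) for nbrs in adj.values()) // 2
    return 2 * n_links / len(adj)


def local_clustering(adj, i):
    """c_i = (links among the k_i neighbours of i) / (k_i (k_i - 1) / 2); 0 if k_i < 2."""
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


def average_clustering(adj):
    """<c>: local clustering averaged over all nodes."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)
```

As a quick arithmetic check of ⟨k⟩ = 2E/N against Table II, 2 × 5450 / 3112 ≈ 3.50 for 1997, and the corresponding random-graph estimate is ⟨c⟩_rand = ⟨k⟩/N ≈ 1.1 × 10^-3.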

With this reasoning one might be led to the conclusion that the Internet topology is close to a regular two-dimensional lattice. The analysis of the chemical distances between nodes, however, reveals that this is not the case. Two nodes i and j are said to be connected if one can go from node i to node j following the connections in the network. The path from i to j may not be unique, and its length is given by the number of links traversed. The average chemical distance ⟨d⟩ is defined as the shortest-path distance d_ij between two nodes i and j, averaged over every pair of nodes in the network. For regular lattices, ⟨d⟩_D ~ N^{1/D}, where D is the spatial dimension. Hence, if the Internet could be mapped onto a two-dimensional lattice, we should observe ⟨d⟩_{D=2} ~ N^{1/2} ≈ 60. However, as can be seen from Table II, for the AS maps ⟨d⟩ ≈ 3.6 ≪ ⟨d⟩_{D=2}. The Internet strikingly exhibits what is known as the "small-world" effect [24, 28]: on average one can go from one node to any other in the system passing through a very small number of intermediate nodes. This necessarily implies that, besides the short local connections which contribute to the large clustering coefficient, there are some hubs and backbones which connect different regional networks, strongly decreasing the average chemical distance. Another measure of this feature is given by the number of shortest paths that pass through each node. To go from one node in the network to another following the shortest path, a sequence of nodes is visited. If we do this for every pair of nodes in the network, there will be a certain number of key nodes that are visited more often than others. Such nodes are of great importance for the transmission of information along the network. This fact can be quantitatively measured by means of the betweenness b_i, defined as the total number of shortest paths between any two nodes in the network that pass through node i. The average betweenness ⟨b⟩ is the average value of b_i over all nodes in the network. The betweenness was introduced in the analysis of social networks in
Ref. […], and more recently it has been studied in scale-free networks under the name of load […]. Moreover, an algorithm to compute the betweenness has been given in Ref. […]. For a star network the betweenness takes its maximum value N(N − 1)/2 at the central node and its minimum value N − 1 at the peripheral nodes of the star. The average betweenness of the three AS maps analyzed here is shown in Table II. Its value is between 2N and 3N, which is quite small in comparison with its maximum possible value N(N − 1)/2 ~ 10^7.

Year      N       E     ⟨k⟩      ⟨c⟩       ⟨d⟩      ⟨b⟩/N
1997   3112    5450   3.5(1)   0.18(3)   3.8(1)   2.4(1)
1998   3834    6990   3.6(1)   0.21(3)   3.8(1)   2.3(1)
1999   5287   10100   3.8(1)   0.24(3)   3.7(1)   2.2(1)

TABLE II: Average properties of the Internet for three different years. N: number of nodes; E: number of connections; ⟨k⟩: average connectivity; ⟨c⟩: average clustering coefficient; ⟨d⟩: average chemical distance; ⟨b⟩/N: average betweenness divided by the number of nodes. Figures in parentheses indicate the statistical uncertainty from averaging the values of the corresponding months in each year.
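The star-network limits quoted above fix one convention: the endpoints of a path are counted, so a peripheral node lies on the N − 1 paths that start or end at it. Below is a minimal sketch under that convention, assuming a connected, unweighted graph stored as adjacency sets; when several shortest paths tie, each one receives an equal fraction of the pair's weight, using Brandes' accumulation scheme. That scheme is our choice for illustration, not necessarily the algorithm of the cited reference, and the function names are hypothetical.

```python
from collections import deque


def betweenness_with_endpoints(adj):
    """Shortest-path betweenness b_i, endpoints included, on an unweighted undirected graph."""
    b = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: distances, number of shortest paths sigma, predecessors on those paths
        dist, sigma, preds, order = {s: 0}, {s: 1}, {s: []}, []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w], sigma[w], preds[w] = dist[v] + 1, 0, []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back towards s
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                b[w] += delta[w] + 1.0  # +1: w is the far endpoint of the path from s
        b[s] += len(order) - 1  # s is the near endpoint of every path it starts
    # Every unordered pair was visited once from each end
    return {v: val / 2 for v, val in b.items()}


def average_chemical_distance(adj):
    """<d>: shortest-path length averaged over ordered pairs of distinct nodes (connected graph)."""
    n, total = len(adj), 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
    return total / (n * (n - 1))
```

Under this convention every shortest path between two nodes at distance d credits exactly d + 1 nodes, so on a connected graph ⟨b⟩ = (N − 1)(⟨d⟩ + 1)/2. With ⟨d⟩ ≈ 3.8 this gives ⟨b⟩/N ≈ 2.4, the same order as the values in Table II.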

The present analysis makes clear that the Internet is not dominated by a very few highly connected nodes, as a star-shaped architecture would be. Likewise, simple average measurements rule out the possibility of a random graph structure or a regular grid architecture. This evidence hints towards a peculiar topology that will be fully identified by looking at the detailed probability distributions of several quantities. Finally, it is important to stress that, although the network size more than doubled in the three-year period considered, the average quantities vary by only a few percent (see Table II). This suggests that the system has reached a fairly well-defined stationary state, as we shall confirm in the next Section by analyzing the detailed statistical properties of the Internet.



IV. FLUCTUATIONS AND SCALE-FREE PROPERTIES

In order to get a deeper understanding of the network topology, we look at the probability distributions p_k(k) and p_b(b) that any given node in the network has connectivity k and betweenness b, respectively. The study of these probability distributions allows us to probe the extent of fluctuations and heterogeneity present in the network. We shall see that the strong scale-free nature of the Internet, previously noted in Refs. […], results in power-law distributions with diverging fluctuations for these quantities. The analysis of the maps reveals, in fact, an algebraic decay of the connectivity distribution,

$$p_k(k) \sim k^{-\gamma}, \tag{1}$$

extending over three orders of magnitude. In Fig. 1 we report the integrated connectivity distribution

$$P_k(k) = \int_k^{\infty} p_k(k')\,dk' \tag{2}$$

corresponding to the AS97, AS98, and AS99 maps. The integrated distribution, which expresses the probability that a node has connectivity larger than or equal to k, scales as

$$P_k(k) \sim k^{1-\gamma}, \tag{3}$$

and it has the advantage of being considerably less noisy than the original distribution. In all maps we find a clear power-law behavior with slope close to −1.2 (see Fig. 1), yielding a connectivity exponent γ = 2.2 ± 0.1. The distribution cut-off is fixed by the maximum connectivity of the system and is related to the overall size of the Internet map. We see that for more recent maps the cut-off increases slightly, as expected from the Internet growth. On the other hand, the connectivity exponent γ seems to be independent of time and in good agreement with previous measurements […].

FIG. 1: Integrated connectivity distribution for the AS97, AS98, and AS99 maps. The power-law behavior is characterized by a slope −1.2, which yields a connectivity exponent γ = 2.2 ± 0.1.
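The exponent estimate rests on reading γ = 1 − slope off the integrated distribution. The sketch below illustrates that procedure on synthetic data drawn from a continuous power law with γ = 2.2, not on the NLANR maps. The sampling helper, the fit range, and the unweighted least-squares fit on logarithmically spaced points are our assumptions; maximum-likelihood estimators are a common alternative.

```python
import math
import random
from bisect import bisect_left


def sample_power_law(n, gamma, k_min=1.0, seed=0):
    """Draw n values from p(k) ~ k^-gamma, k >= k_min, by inverse-transform sampling."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so the power is always finite
    return [k_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0)) for _ in range(n)]


def integrated_distribution(values, points):
    """Empirical P(k): fraction of values >= k, for each k in points."""
    data = sorted(values)
    n = len(data)
    return [(n - bisect_left(data, k)) / n for k in points]


def loglog_slope(xs, ys):
    """Ordinary least-squares slope of log10(y) against log10(x)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


def estimate_gamma(values, k_min, k_max, n_points=20):
    """gamma = 1 - slope, because P(k) ~ k^(1 - gamma); fit on log-spaced points."""
    points = [k_min * (k_max / k_min) ** (i / (n_points - 1)) for i in range(n_points)]
    return 1.0 - loglog_slope(points, integrated_distribution(values, points))
```

On real maps the fit range would have to stop short of the cut-off discussed above, where the integrated distribution bends down.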

The betweenness distribution p_b(b) (i.e., the probability that any given node is passed over by b shortest paths) also shows scale-free properties, with a power-law distribution

$$p_b(b) \sim b^{-\delta} \tag{4}$$

extending over three decades. As shown in Fig. 2(a), the integrated betweenness distribution measured in the AS maps is evidently stable over the three-year period analyzed and follows a power-law decay

$$P_b(b) = \int_b^{\infty} p_b(b')\,db' \sim b^{1-\delta}, \tag{5}$$

where the betweenness exponent is δ = 2.1 ± 0.2. The connectivity and betweenness exponents can be simply related if one assumes that the number of shortest paths b_k passing over a node of connectivity k follows the scaling form

$$b_k \sim k^{\beta}. \tag{6}$$

By inserting the latter relation in the integrated betweenness distribution, Eq. (5), we obtain

$$P_k(k) \sim k^{\beta(1-\delta)}. \tag{7}$$

Since P_k(k) ~ k^{1−γ}, we obtain the scaling relation

$$\beta = \frac{\gamma - 1}{\delta - 1}. \tag{8}$$

FIG. 2: a) Integrated betweenness distribution for the AS97, AS98, and AS99 maps. The power-law behavior is characterized by a slope −1.1, which yields a betweenness exponent δ = 2.1 ± 0.2. b) Betweenness b_k as a function of the node's connectivity k. The full line corresponds to the predicted behavior b_k ~ k. Error bars take into account statistical fluctuations over different nodes with the same connectivity.



The measured γ and δ have approximately the same value for the AS maps data, and we therefore expect to recover β ≈ 1.0. This is corroborated in Fig. 2(b), where we report the direct measurement of the average betweenness of a node as a function of its connectivity k. It is also worth remarking that it has recently been argued […] that the betweenness distribution of scale-free networks with 2 < γ < 3 is a universal quantity, not depending on γ. From a numerical study of two scale-free network models […], it was found that the betweenness distribution follows a universal power-law decay with an exponent δ ≈ 2.2, in fair agreement with our findings.
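The step from Eq. (6) to Eq. (8) can be spelled out. The only extra assumption is that b_k grows monotonically with k, so that the fraction of nodes with connectivity at least k equals the fraction with betweenness at least b_k:

```latex
\begin{align}
P_k(k) &= P_b(b_k) \sim b_k^{\,1-\delta} \sim \left(k^{\beta}\right)^{1-\delta} = k^{\beta(1-\delta)}, \\
P_k(k) &\sim k^{1-\gamma}
  \;\Longrightarrow\; \beta(1-\delta) = 1-\gamma
  \;\Longrightarrow\; \beta = \frac{\gamma-1}{\delta-1}, \\
\gamma = 2.2,\ \delta = 2.1 &\;\Longrightarrow\; \beta = \frac{1.2}{1.1} \approx 1.09 .
\end{align}
```

The central value β ≈ 1.09 is what makes the linear behavior b_k ~ k of Fig. 2(b) the natural expectation.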

Another quantity of interest is the probability distribution of the clustering coefficient of the nodes. In our analysis we do not find definitive evidence for a power-law behavior of this distribution. However, useful information can still be gathered from studying the clustering coefficient c_k as a function of the node connectivity. In this case the local clustering coefficient c_i of each node is averaged over all nodes with the same connectivity k. The plots for the AS97, AS98, and AS99 maps are shown in Fig. 3. Also in this case, measurements yield a power-law behavior c_k ~ k^{−ω} with ω = 0.75 ± 0.03, extending over three orders of magnitude. This implies that nodes with a small number of connections have larger local clustering coefficients than those with a large connectivity. This behavior is consistent with the picture described in Sec. III of highly clustered regional networks sparsely interconnected by national backbones and international connections. The regional clusters of AS are probably formed by a large number of nodes with small connectivity but large clustering coefficients. Moreover, they should also contain nodes with large connectivities that are connected with the other regional clusters. These large-connectivity nodes will in turn be connected to nodes in different clusters which are not interconnected and, therefore, will have a small local clustering coefficient. This picture also points to the existence of some hierarchy in the network, which will become more evident in the next Section.
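The average c_k can be computed by grouping nodes by connectivity. A minimal sketch, assuming an adjacency-set representation (a dictionary from each node to the set of its neighbors); the function name is hypothetical, and skipping nodes with k < 2, for which c_i is undefined, is our choice:

```python
from collections import defaultdict
from itertools import combinations


def clustering_by_connectivity(adj):
    """c_k: local clustering c_i averaged over all nodes i with connectivity k (k >= 2)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # c_i undefined for k < 2; such nodes are skipped here
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        sums[k] += links / (k * (k - 1) / 2)
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(counts)}
```

The exponent ω can then be read off from a least-squares fit of log c_k against log k.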

FIG. 3: Clustering coefficient c_k as a function of the connectivity k for the AS97, AS98, and AS99 maps. The best-fitting power-law behavior is characterized by a slope −0.75. Error bars take into account statistical fluctuations over different nodes with the same connectivity.

A different behavior is followed by the chemical distance d between two nodes, which does not show singular fluctuations from one pair of nodes to another. This can be shown by means of the probability distribution p_d(d) of chemical distances d between pairs of nodes, reported in Fig. 4(a). This distribution is characterized by a sharp peak around its average value, and its shape remains essentially unchanged from the AS97 to the AS99 maps. Associated with the chemical distance distribution we have the hop plot introduced in Ref. […]. The hop plot is defined as the average fraction of nodes M(d)/N within a chemical distance less than or equal to d from a given node. At d = 0 we find only the starting node and, therefore, M(0) = 1. At d = 1 we find the starting node plus its neighbors, and thus M(1) = ⟨k⟩ + 1. If the network is made of a single cluster, then for d = d_M, where d_M is the maximum chemical distance, M(d_M) = N. For regular D-dimensional lattices, M(d) ~ d^D, and in this case M can be interpreted as the mass. The hop plot is related to the distribution of chemical distances through the following relation:

$$\frac{M(d)}{N} = \sum_{d'=0}^{d} p_d(d'). \tag{9}$$

The hop plots for the AS97, AS98, and AS99 maps are shown in Fig. 4(b). In this case the chemical distance barely spans a decade (d_M = 11). Most importantly, M(d) practically reaches its maximum value N at d = 5. Hence, the chemical distance does not show strong fluctuations, as already noticed from the chemical distance distribution. In Ref. […] it was argued that the increase of M(d) for small d follows a power law. This observation is not consistent with the present data, which yield a very abrupt increase taking place in a very narrow range.
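A minimal sketch of the hop plot, assuming an unweighted graph stored as adjacency sets; the function names are hypothetical. It runs a breadth-first search from every node, which is practical for maps of a few thousand AS but costs of order N × E. The distance distribution is taken here over ordered pairs including i = j, which is what makes M(0) = 1 and Eq. (9) hold:

```python
from collections import deque


def bfs_distances(adj, source):
    """Chemical distances (number of links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def hop_plot(adj):
    """List whose d-th entry is M(d)/N, the average fraction of nodes within distance <= d."""
    n = len(adj)
    counts = {}  # counts[d] = number of ordered pairs (i, j) at distance d, including i == j
    for s in adj:
        for d in bfs_distances(adj, s).values():
            counts[d] = counts.get(d, 0) + 1
    fractions, running = [], 0
    for d in range(max(counts) + 1):
        running += counts.get(d, 0)
        fractions.append(running / (n * n))  # cumulative sum of p_d(d'), as in Eq. (9)
    return fractions
```

On a five-node star this gives M(1)/N = 0.52, i.e. M(1) = 2.6 = ⟨k⟩ + 1. On a disconnected map the last entry stays below 1.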

Finally, it is important to stress again that all the measured distributions are characterized by scaling exponents or behaviors that do not change in time. This implies that the statistical properties characterizing the Internet are time-independent, providing a further test of the network stationarity; i.e., the Internet is self-organized in a stationary state characterized by scale-free fluctuations.



V. HIERARCHY AND CORRELATIONS 

Due to installation costs, the Internet has been designed with a hierarchical structure. The primary known structural difference between Internet nodes is the distinction between stub and transit domains. Nodes in stub domains have links that go only through the domain itself. Stub domains are connected via a gateway node to transit domains which, in contrast, are fairly well interconnected via many paths. This hierarchy can be schematically divided into international connections, national backbones, regional networks, and local area networks. Nodes providing access to international connections or national backbones are naturally at the top level of this hierarchy, since they make possible the communication between regional and local area networks. Moreover, in this way, a small average chemical distance can be achieved with a small average connectivity.

Very likely the hierarchical structure introduces some correlations in the network topology. We can explore the hierarchical structure of the Internet by means of the conditional probability p_c(k'|k) that a link belonging to a node with connectivity k points to a node with connectivity k'. If this conditional probability is independent of k, we are in the presence of a topology without any correlation among the nodes' connectivities. In this case, p_c(k'|k) = p_c(k') ~ k' p_k(k'), in view of the fact that any link points to nodes with a probability proportional to their connectivity. On the contrary, an explicit dependence on k is a signature of non-trivial correlations among the nodes' connectivities, and of the presence of a hierarchical structure in the network topology. A direct measurement of the function p_c(k'|k) is a rather complex task due to large statistical fluctuations. Clearer indications can be extracted by studying the quantity

$$\langle k_{nn} \rangle = \sum_{k'} k'\, p_c(k'|k), \tag{10}$$

i.e., the average connectivity of the nearest neighbors of nodes with connectivity k. In Fig. 5(a) we show the results obtained for the AS97, AS98, and AS99 maps, which again exhibit a clear power-law dependence on the connectivity degree,

$$\langle k_{nn} \rangle \sim k^{-\nu_k}, \tag{11}$$

with an exponent ν_k = 0.5 ± 0.1. This observation clearly implies that the connectivity correlation function has a marked dependence upon k, suggesting non-trivial correlation properties for the Internet. In practice, this result indicates that highly connected nodes are more likely to point to less connected nodes, emphasizing the presence of a hierarchy in which smaller providers connect to larger ones and so on, climbing different levels of connectivity.

FIG. 5: a) Average connectivity ⟨k_nn⟩ of the nearest neighbors of a node as a function of its connectivity k for the AS97, AS98, and AS99 maps. The full line has a slope −0.5. b) Average betweenness ⟨b_nn⟩ of the nearest neighbors of a node as a function of its betweenness b for the same maps. The full line has a slope −0.4.
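A minimal sketch of ⟨k_nn⟩ as a function of k, assuming adjacency sets; the function name is hypothetical. Averaging the neighbor connectivities over all links that leave nodes of connectivity k is the same as evaluating Eq. (10) with the empirical p_c(k'|k):

```python
from collections import defaultdict


def knn_by_connectivity(adj):
    """<k_nn>(k): mean connectivity of the neighbours of nodes with connectivity k."""
    degree = {i: len(nbrs) for i, nbrs in adj.items()}
    total = defaultdict(float)  # sum of neighbour connectivities over links leaving degree-k nodes
    links = defaultdict(int)    # number of such links
    for i, nbrs in adj.items():
        k = degree[i]
        for j in nbrs:
            total[k] += degree[j]
            links[k] += 1
    return {k: total[k] / links[k] for k in sorted(links)}
```

On a five-node star the result is {1: 4.0, 4: 1.0}: the hub points only to leaves and the leaves only to the hub, the extreme case of a decreasing ⟨k_nn⟩. The same grouping applied to betweenness values instead of connectivities gives ⟨b_nn⟩.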

Similarly, it is expected that nodes with high betweenness (that is, carrying a heavy load of transit), and consequently a large connectivity, will be connected to nodes with smaller betweenness, less load and, therefore, small connectivity. A simple way to measure this effect is to compute the average betweenness ⟨b_nn⟩ of the neighbors of the nodes with a given betweenness b. The plot of ⟨b_nn⟩ for the AS97, AS98, and AS99 maps, shown in Fig. 5(b), indicates that the average neighbor betweenness exhibits a clear power-law dependence on the node betweenness b,

$$\langle b_{nn} \rangle \sim b^{-\nu_b}, \tag{12}$$

with an exponent ν_b = 0.4 ± 0.1, evidencing that the more loaded nodes (backbones) are more frequently connected with less loaded nodes (local networks).

These hierarchical properties of the Internet are likely driven by several additional factors such as spatial locality, economic resources, and market demand. An attempt to relate and study some of these aspects can be found in Ref. [13], where the geographical distribution of population and Internet access is studied. In Sec. VII we shall compare a few of the existing models for the generation of scale-free networks with our data analysis, in an attempt to identify some relevant features in the Internet modeling.



VI. DYNAMICS AND GROWTH 

In order to inspect the Internet dynamics, we focus our attention on the addition of new nodes and links to the maps. In the three-year range considered, we keep track of the number of links $L_{\rm new}$ appearing between a newly introduced node and an already existing node. We also monitor the rate of appearance of links $L_{\rm old}$ between already existing nodes. In Table III we can observe that the creation of new links is governed by these two processes at the same time. Specifically, the largest contribution to the growth is given by the appearance of links between already existing nodes ($L_{\rm new}/L_{\rm old} < 1$ in all three years). This indicates that the Internet growth is strongly driven by the need for redundancy in the wiring and by an increased need for available bandwidth for data transmission.
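The bookkeeping behind $L_{\rm new}$ and $L_{\rm old}$ can be sketched as a comparison of two map snapshots. This is an illustrative procedure of our own, not the authors' processing pipeline: snapshots are assumed to be plain edge lists, a node counts as existing if it appears in some link of the earlier map, self-loops are assumed absent, and links joining two new nodes are reported separately because they belong to neither category.

```python
def classify_new_links(old_edges, new_edges):
    """Classify links present in a later map snapshot but absent earlier.

    old_edges, new_edges -- iterables of (u, v) pairs (undirected links).
    Returns a dict with the number of new links that join
      'new'     : a newly introduced node to an existing node (L_new),
      'old'     : two already existing nodes (L_old),
      'new_new' : two newly introduced nodes (in neither count).
    """
    old = {frozenset(e) for e in old_edges}
    existing = set().union(*old)          # nodes seen in the old snapshot
    counts = {"new": 0, "old": 0, "new_new": 0}
    for link in {frozenset(e) for e in new_edges} - old:
        n_existing = sum(1 for node in link if node in existing)
        if n_existing == 2:
            counts["old"] += 1
        elif n_existing == 1:
            counts["new"] += 1
        else:
            counts["new_new"] += 1
    return counts
```

Dividing the two counts by the number of months separating the snapshots gives monthly rates of the kind listed in Table III.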

FIG. 6: Average connectivity of nodes born within a small time window $\Delta t_0$, after a time $t$ elapsed since their appearance. Time $t$ is measured in days. As a comparison we report the lines corresponding to $t^{0.1}$ and $t^{0.5}$.

FIG. 7: Integrated frequency of links emanating from new and existing nodes that attach to nodes with connectivity $k$. The full line corresponds to a slope $-0.2$, which yields an exponent $\alpha \simeq 1.0$. The flat tails originate from the poor statistics at very high $k$ values.

A customarily measured quantity in the case of growing networks is the average connectivity $\langle k_i(t)\rangle$ of new nodes as a function of their age $t$. In Refs. [jlR [HJ p^] it is shown that $\langle k_i(t)\rangle$ is a scaling function of both $t$ and the absolute time of birth $t_0$ of the node. We thus consider the nodes born within a small observation window $\Delta t_0$, such that $t_0 \simeq$ const with respect to the absolute time scale given by the Internet lifetime. For these nodes, we measure the average connectivity as a function of the time $t$ elapsed since their birth. The data for two different time windows are reported in Fig. 6, where it is possible to distinguish two different dynamical regimes: at early times, the connectivity is nearly constant, with a very slow increase ($\langle k_i(t)\rangle \sim t^{0.1}$). Later on, the behavior approaches a power-law growth $\langle k_i(t)\rangle \sim t^{0.5}$. While the exponent estimates are affected by noise and by the limited time window, the crossover between two distinct dynamical regimes is compatible with the general aging form obtained in the context of growing networks in Ref.
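The cohort measurement behind Fig. 6 can be sketched as follows. The data layout (a dictionary of time-stamped adjacency dictionaries), the function names, and the choice of giving connectivity 0 to nodes that vanish from later maps are our assumptions for illustration. Because $\Delta t_0$ is small, the age of every cohort member is approximated by $t - t_0$. Local log-log slopes between consecutive points help locate the crossover between the slow $t^{0.1}$ regime and the faster $t^{0.5}$ regime.

```python
import math


def cohort_degree_vs_age(snapshots, t0, dt0):
    """Average connectivity <k_i(t)> of the nodes born in [t0, t0 + dt0).

    snapshots -- dict: observation time (e.g. day) -> adjacency dict
                 {node: set_of_neighbors}
    Birth time = first snapshot in which a node appears. The age of each
    cohort member is approximated by t - t0 (dt0 assumed small).
    Nodes missing from a later snapshot contribute connectivity 0.
    """
    birth = {}
    for t in sorted(snapshots):
        for node in snapshots[t]:
            birth.setdefault(node, t)
    cohort = [n for n, tb in birth.items() if t0 <= tb < t0 + dt0]
    curve = []
    if not cohort:
        return curve
    for t in sorted(snapshots):
        if t >= t0 + dt0:
            adj = snapshots[t]
            mean_k = sum(len(adj.get(n, ())) for n in cohort) / len(cohort)
            curve.append((t - t0, mean_k))
    return curve


def local_exponents(curve):
    """Log-log slopes between consecutive (age, <k>) points (values > 0)."""
    return [(math.log(k2) - math.log(k1)) / (math.log(a2) - math.log(a1))
            for (a1, k1), (a2, k2) in zip(curve, curve[1:])]
```

A plateau of the local exponent near 0.1 followed by values approaching 0.5 would reproduce the two regimes described above.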

Year   $L_{\rm new}$   $L_{\rm old}$   $L_{\rm new}/L_{\rm old}$
1997   183(9)          546(35)         0.34(2)
1998   170(8)          350(9)          0.48(2)
1999   231(11)         450(29)         0.53(3)

TABLE III: Monthly rate of new links connecting existing nodes to new ($L_{\rm new}$) and old ($L_{\rm old}$) nodes.

A very important issue in the modeling of growing networks concerns the understanding of the growth mechanisms at the origin of the development of new links. As we shall see in more detail in the next Section, the basic ingredient in the modeling of scale-free growing networks is the preferential attachment hypothesis [14]. In general, growing network algorithms define models in which the rate $\Pi(k)$ with which a node with $k$ connections receives new links is proportional to $k^\alpha$ (see Ref. [TlI] and Sec. VII). Determining the exact value of $\alpha$ in real networks is an important issue, since the connectivity properties strongly depend on this exponent [32], [L3j. Here we use a simple recipe that allows us to extract the value of $\alpha$ by studying the appearance of new links. We focus on links emanating from newly appeared nodes in different time windows ranging from one to three years. We consider the frequency $\mu(k)$ of links that connect to nodes with connectivity $k$. Under the preferential attachment hypothesis, this effective probability is $\mu(k) \sim k^\alpha p_k(k)$. Since we know that $p_k(k) \sim k^{-\gamma}$, we expect to find a power-law behavior $\mu(k) \sim k^{\alpha-\gamma}$ for the frequency. In Fig. 7 we report the results obtained for the integrated frequency $\mu_{\rm cum}(k) = \int_k^\infty \mu(k')\,dk'$, which shows a behavior compatible with an algebraic dependence $\mu(k) \sim k^{-1.2}$ (that is, $\mu_{\rm cum}(k) \sim k^{-0.2}$). By using the independently obtained value $\gamma = 2.2$, we find a preferential attachment exponent $\alpha \simeq 1.0$, in good agreement with the result obtained with a different analysis in Ref. |3j|. We performed a similar analysis also for links emanating from existing nodes, recovering the same form of preferential attachment (see Fig. 7). The present analysis confirms the validity of the preferential attachment hypothesis, but leaves open the question of its interplay with several other factors, such as the nodes' hierarchy, spatial locality, and resource constraints.
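The recipe for $\alpha$ can be sketched numerically. In this illustration (our assumptions, with hypothetical names) the integral defining $\mu_{\rm cum}(k)$ is replaced by a discrete sum over the recorded connectivities of the nodes receiving each new link, and the slope is an ordinary least-squares fit in log-log scale. Since $\mu_{\rm cum}(k) \sim k^{\alpha-\gamma+1}$, the fitted slope $s$ gives $\alpha = s + \gamma - 1$. In practice the fitting range should exclude the flat, poorly sampled tail mentioned in the caption of Fig. 7.

```python
import math


def integrated_frequency(target_degrees):
    """mu_cum(k): fraction of new links whose target had connectivity >= k.

    target_degrees -- connectivity of the receiving node, recorded when each
    new link appeared (raw counts, not divided by the number of nodes with
    that connectivity, so that mu(k) ~ k^alpha p_k(k)).
    """
    n = len(target_degrees)
    ks = sorted(set(target_degrees))
    return ks, [sum(1 for d in target_degrees if d >= k) / n for k in ks]


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


def attachment_exponent(cum_slope, gamma):
    """mu_cum(k) ~ k^(alpha - gamma + 1)  =>  alpha = slope + gamma - 1."""
    return cum_slope + gamma - 1.0
```

With the values quoted above, `attachment_exponent(-0.2, 2.2)` gives $\alpha \approx 1.0$.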



VII. MODELING THE INTERNET 

In the previous Sections we have presented a thorough analysis of the topology of the AS maps. Apart from providing useful empirical data for understanding the behavior of the Internet, our analysis is highly relevant for testing the validity of models of the Internet topology. The Internet topology has a great influence on the information traffic carried on top of it, and thus on routing algorithms [16, 0], searching algorithms |is|, virus spreading [20], and resilience to node failures [21, 22]. Thus, designing network models that accurately reproduce the Internet topology is of capital importance for carrying out simulations on top of these networks.

Early works considered the Erdős-Rényi model [34] or hierarchical networks as models of the Internet [35]. However, these yield connectivity distributions with a fast (exponential) decay at large connectivities, in disagreement with the power-law decay observed in real data. Only recently has Internet modeling benefited from the major advance provided in the field of growing networks by the introduction of the Barabási-Albert (BA) model [14, 15, Bq], which is related to Simon's 1955 model [37, |3q 7 pW]. The main ingredients of this model are the growing nature of the network and a preferential attachment rule, in which the probability of establishing new links toward a given node grows linearly with its connectivity. The BA model is constructed using the following algorithm [14]: we start from a small number $m_0$ of disconnected nodes; at every time step a new node is added, with $m$ links that are connected to an old node $i$ with probability



$$\Pi(k_i) = \frac{k_i}{\sum_j k_j}, \tag{13}$$



where $k_i$ is the connectivity of the $i$-th node. After iterating this procedure $N$ times, we obtain a network with a connectivity distribution $p_k(k) \sim k^{-3}$ and average connectivity $\langle k\rangle = 2m$. In this model, heavily connected nodes increase their connectivity at a larger rate than less connected nodes, a phenomenon known as the "rich-get-richer" effect [14]. It is worth remarking, however, that more general studies ||, [u], |3^[ have revealed that nonlinear attachment rates of the form $\Pi(k) \sim k^\alpha$ with $\alpha \neq 1$ produce connectivity distributions that depart from the power-law behavior. The BA model has subsequently been modified through the introduction of several ingredients, in order to account for connectivity distributions with $2 < \gamma < 3$ ||l], ||^, [h)|], local geographical factors f4H|, wiring among existing nodes p^ |, and age effects [43].
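The BA growth rule of Eq. (13) admits a compact sketch. This is a standard implementation trick, not necessarily the one used in the works cited: sampling uniformly from a list in which each node appears once per link end it owns is equivalent to choosing node $i$ with probability $k_i/\sum_j k_j$. One assumption must be made explicit: with $m_0 = m$ initially disconnected seed nodes, $\sum_j k_j = 0$ at the first step, so we let the first new node link to all seeds. The function name is hypothetical.

```python
import random


def barabasi_albert(n, m, seed=None):
    """Grow a BA network of n nodes in which every new node brings m links.

    Pi(k_i) = k_i / sum_j k_j is realised by drawing uniformly from a list in
    which each node appears once per link end it owns (k_i times).
    Assumption: the m0 = m seed nodes start disconnected, so the first new
    node, finding sum_j k_j = 0, links to all of them.
    """
    rng = random.Random(seed)
    edges = []
    ends = []                          # node i appears k_i times
    for new in range(m, n):
        if ends:
            targets = set()
            while len(targets) < m:    # m distinct old nodes
                targets.add(rng.choice(ends))
        else:
            targets = set(range(m))
        for t in targets:
            edges.append((new, t))
            ends.extend((new, t))
    return edges
```

Each new node adds exactly $m$ links, so the average connectivity approaches $2m$ as $N$ grows, as stated above.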

In the previous Sections we have analyzed different measures that characterize the structure of the AS maps. Since several models are able to reproduce the right power-law behavior of the connectivity distribution, these measures can provide effective tools to scrutinize the different models at a deeper level. In particular, we perform a data comparison for three different models that generate networks with power-law connectivity distributions. First, we consider a random graph constructed with a power-law connectivity distribution, using the Molloy and Reed (MR) algorithm [44, 45]. Second, we study two variations of the BA model that yield connectivity exponents compatible with the one measured for the Internet: the generalized Barabási-Albert (GBA) model pfj||, which includes the possibility of connection rewiring, and the fitness model P6|, which implements a weighting of the nodes in the preferential attachment probability.

FIG. 8: Integrated connectivity distribution for the MR, GBA, and fitness models, compared with the result from the AS98 map. The full line has slope $-1.2$.

The models are defined as follows: 

MR model: In the construction of this model [^, 44, 45], we start by assigning to each node $i$ in a set of $N$ nodes a random connectivity $k_i$ drawn from the probability distribution $p_k(k) \sim k^{-\gamma}$, with $m \le k_i < N$, imposing the constraint that the sum $\sum_i k_i$ must be even. The graph is completed by randomly connecting the nodes with $\sum_i k_i/2$ links, respecting the assigned connectivities. The results presented here are obtained using $m = 1$ and a connectivity exponent $\gamma = 2.2$, equal to that found in the AS maps. Clearly, this construction algorithm does not take into account any correlations or dynamical features of the Internet, and it can be considered as a first-order approximation that focuses only on the connectivity properties.
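A minimal sketch of the MR construction, with $m = 1$ and $\gamma = 2.2$ as in the text, is given below. It is an illustration under stated assumptions, not the code behind the results of this Section: the function names are hypothetical, parity is enforced by redrawing a single entry, and the random pairing of link ends ("stubs") discards self-loops and repeated links, a common simplification that slightly lowers the realized connectivity of a few nodes.

```python
import random


def power_law_connectivities(n, gamma, m, rng):
    """Draw n connectivities from p_k(k) ~ k^-gamma with m <= k < n,
    redrawing single entries until sum_i k_i is even."""
    ks = list(range(m, n))
    weights = [k ** -gamma for k in ks]
    degrees = rng.choices(ks, weights=weights, k=n)
    while sum(degrees) % 2:
        degrees[rng.randrange(n)] = rng.choices(ks, weights=weights)[0]
    return degrees


def molloy_reed(degrees, rng):
    """Pair link ends ("stubs") at random, giving sum_i k_i / 2 links.

    Simplification (ours): self-loops and repeated links produced by the
    pairing are dropped, so a few nodes end up with slightly smaller k_i.
    """
    stubs = [i for i, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    links = set()
    for a, b in zip(stubs[0::2], stubs[1::2]):
        if a != b:
            links.add((min(a, b), max(a, b)))
    return links
```

Because links are paired blindly, the resulting graph carries no correlations beyond those implied by the connectivity sequence, which is exactly the property exploited in the comparison below.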

GBA model: It is defined by starting with $m_0$ nodes connected in a ring jfOj. At each time step one of the following operations is performed:

(i) With probability $q$ we rewire $m$ links. For each of them, we randomly select a node $i$ and a link $l_{ij}$ connected to it. This link is removed and replaced by a new link $l_{i'j}$ connecting the node $j$ to a new node $i'$ selected with probability $\Pi_{\rm GBA}(k_{i'})$, where



$$\Pi_{\rm GBA}(k_i) = \frac{k_i + 1}{\sum_j (k_j + 1)}. \tag{14}$$



(ii) With probability $p$ we add $m$ new links. For each of them, one end of the link is selected at random, while the other is selected with probability as in Eq. (14).

(iii) With probability $1 - q - p$ we add a new node with $m$ links that are connected to nodes already present with probability as in Eq. (14).

The preferential attachment probability of Eq. (14) leads to a power-law distributed connectivity, whose exponent depends on the parameters $q$ and $p$. In the particular case $p = 0$, the connectivity exponent is given by



$$\gamma = 1 + \frac{(1-q)(2m+1)}{m}. \tag{15}$$



Hence, by changing the values of $m$ and $q$ we can obtain the desired connectivity exponent $\gamma$. In the present simulations we use the values $m = 2$ and $q = 13/25$, which yield the exponent $\gamma = 1 + (12/25)(5/2) = 2.2$. The GBA model embeds both the rich-get-richer paradigm and the growing nature of the Internet; however, it does not take into account any possible differences or hierarchies among newly appearing nodes.
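The three GBA operations and the exponent of Eq. (15) can be sketched in Python as follows. This is an illustration under our own assumptions, not the code used for the simulations reported here: the ring of $m_0 = 3$ seed nodes, the policy of skipping proposed links that would create self-loops or duplicates, and all names are hypothetical. Preferential choices follow the $(k+1)$ weighting of Eq. (14).

```python
import random


def gba_gamma(m, q):
    """Eq. (15): connectivity exponent of the GBA model when p = 0."""
    return 1.0 + (1.0 - q) * (2 * m + 1) / m


def gba_network(steps, m=2, q=13 / 25, p=0.0, m0=3, seed=None):
    """Sketch of the GBA operations with Pi(k_i) = (k_i + 1) / sum_j (k_j + 1).

    Starts from a ring of m0 >= 3 nodes. Simplification (ours): a proposed
    link that would be a self-loop or duplicate an existing link is skipped.
    Returns an adjacency dict {node: set_of_neighbors}.
    """
    rng = random.Random(seed)
    adj = {i: {(i - 1) % m0, (i + 1) % m0} for i in range(m0)}

    def preferential(exclude):
        # Choose a node outside `exclude` with weight k + 1, as in Eq. (14).
        nodes = [v for v in adj if v not in exclude]
        if not nodes:
            return None
        return rng.choices(nodes, weights=[len(adj[v]) + 1 for v in nodes])[0]

    for _ in range(steps):
        r = rng.random()
        if r < q:                       # (i) rewire m links
            for _ in range(m):
                i = rng.choice(list(adj))
                if not adj[i]:
                    continue
                j = rng.choice(sorted(adj[i]))
                # the new end i' must differ from j and not already touch j
                new_i = preferential(exclude={j} | adj[j])
                if new_i is None:
                    continue
                adj[i].discard(j)
                adj[j].discard(i)
                adj[new_i].add(j)
                adj[j].add(new_i)
        elif r < q + p:                 # (ii) add m links between old nodes
            for _ in range(m):
                i = rng.choice(list(adj))
                j = preferential(exclude={i} | adj[i])
                if j is not None:
                    adj[i].add(j)
                    adj[j].add(i)
        else:                           # (iii) add a node with m links
            new, targets = len(adj), set()
            for _ in range(m):
                t = preferential(exclude=targets)
                if t is not None:
                    targets.add(t)
            adj[new] = set(targets)
            for t in targets:
                adj[t].add(new)
    return adj
```

With $m = 2$ and $q = 13/25$, `gba_gamma` returns approximately 2.2, matching the value used in the text. Rewiring can leave nodes with no links at all, which is why the generated graphs need not be connected.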

Fitness model: This network model introduces an external competition among nodes to gain links, controlled by a random (fixed) fitness parameter $\eta_i$ that is assigned to each node $i$ from a probability distribution $\rho(\eta)$. In this case, we also start with $m_0$ nodes connected in a ring, and at each time step we add a new node $i'$ with $m$ links that are connected to nodes already present in the network with probability



$$\Pi_{\rm fitness}(k_i) = \frac{\eta_i k_i}{\sum_j \eta_j k_j}. \tag{16}$$



The newly added node is assigned a fitness $\eta_{i'}$. The results presented here are obtained using $m = 2$ and a distribution $\rho(\eta)$ uniform in the interval $[0,1]$, which yields a connectivity distribution $p_k(k) \sim k^{-\gamma}/\ln k$ with $\gamma \simeq 2.26$ @]. The fitness model adds to the growing dynamics with preferential attachment a stochastic parameter, the fitness, that embeds all the properties, other than the connectivity, that may influence the probability of gaining new links.
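The fitness growth rule of Eq. (16) can be sketched in the same style. The ring of $m_0 = 3$ seed nodes, the redrawing of targets until $m$ distinct nodes are found, and the names used are our assumptions for illustration; fitness values are drawn uniformly from $[0, 1)$, matching the uniform $\rho(\eta)$ quoted above up to the measure-zero endpoint.

```python
import random


def fitness_network(n, m=2, m0=3, seed=None):
    """Grow a network with Pi_fitness(k_i) = eta_i k_i / sum_j eta_j k_j.

    Assumptions (ours): the seed is a ring of m0 >= 3 nodes, fitness values
    are drawn uniformly from [0, 1), and each new node picks m distinct targets.
    Returns (adjacency dict, fitness dict).
    """
    rng = random.Random(seed)
    adj = {i: {(i - 1) % m0, (i + 1) % m0} for i in range(m0)}
    eta = {i: rng.random() for i in range(m0)}
    for new in range(m0, n):
        nodes = list(adj)
        weights = [eta[i] * len(adj[i]) for i in nodes]
        targets = set()
        while len(targets) < m:          # redraw until m distinct nodes
            targets.add(rng.choices(nodes, weights=weights)[0])
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
        eta[new] = rng.random()          # fitness of the newly added node
    return adj, eta
```

Unlike the GBA sketch, links are never removed here, so every node stays attached to the growing network and the result is connected.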

                      MR        GBA       Fitness   1998
$\langle k\rangle$    4.8(1)    5.4(1)    4.00(1)   3.6(1)
$\langle c\rangle$    0.16(1)   0.12(1)   0.02(1)   0.21(3)
$\langle d\rangle$    3.1(1)    1.8(1)    4.0(1)    3.8(1)
$\langle b\rangle/N$  2.2(1)    1.9(1)    2.1(1)    2.3(1)

TABLE IV: Average properties of the MR, GBA, and fitness models, compared with the values from the Internet in 1998. $\langle k\rangle$: average connectivity; $\langle c\rangle$: average clustering coefficient; $\langle d\rangle$: average chemical distance; $\langle b\rangle$: average betweenness. Figures in parentheses indicate the statistical uncertainty from the average over 1000 realizations of the models.

We have performed simulations of these three models using the parameters mentioned above and sizes of $N \simeq 4000$ nodes, in analogy with the size of the AS map analyzed. In each case we perform averages over 1000 different realizations of the networks. It is worth remarking that while the fitness model generates a connected network, both the GBA and the MR models yield disconnected networks. This is due to the rewiring process in the GBA model, while the disconnected nature of the graph in the MR model is an inherent consequence of the connectivity exponent being larger than 2 [17]. In these two cases we therefore work with graphs whose giant component (that is, the largest cluster of connected nodes in the network ^7j) has a size of the order of $N$. It is important to remind the reader that we are working with networks of a relatively small size, chosen so as to fit the size of the Internet maps analyzed in the previous Sections. In this perspective, all the numerical analyses that we shall perform in the following serve only to check the validity of the models as representations of the Internet as we know it, and do not refer to the intrinsic properties of the models in the thermodynamic limit $N \to \infty$.
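Restricting the MR and GBA networks to their giant component, as done above, requires only a breadth-first search. A minimal sketch on the same adjacency-dictionary layout used in the previous examples, with hypothetical function names:

```python
from collections import deque


def giant_component(adj):
    """Return the node set of the largest connected cluster (BFS)."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp = {start}
        queue = deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        if len(comp) > len(best):
            best = comp
    return best


def restrict(adj, nodes):
    """Subgraph induced by `nodes` (links leaving the set are dropped)."""
    return {u: adj[u] & nodes for u in nodes}
```

All averages and distributions for the MR and GBA models would then be computed on `restrict(adj, giant_component(adj))`.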

As a first check of the connectivity properties of the models, in Fig. 8 we plot the integrated connectivity distributions. For the MR model we recover the expected exponent $\gamma_{\rm MR} \simeq 2.20$, since it was imposed in the very definition of the model. For the GBA model we obtain numerically $\gamma_{\rm GBA} \simeq 2.19$ for the giant component, in excellent agreement with the value predicted by Eq. (15) for the asymptotic network. For the fitness model, on the other hand, a numerical regression of the integrated connectivity distribution yields an effective exponent $\gamma_{\rm fitness} \simeq 2.4$. This value is larger than the theoretical prediction 2.26 obtained for the model [^6|. The discrepancy is mainly due to the logarithmic corrections present in the connectivity distribution of this model. These corrections are more evident in the relatively small networks used in this work and become progressively smaller for larger network sizes.



In Table IV we report the average values of the connectivity, clustering coefficient, chemical distance, and betweenness for the three models, compared with the respective values computed for the Internet in 1998. From an examination of this Table, one could surprisingly conclude that the MR model, which by construction neglects any correlation among nodes, yields the average values in best agreement with the Internet data. As we can observe, the fitness model gives too small a value for the average clustering coefficient, while the GBA model clearly fails for the average chemical distance and the betweenness. A more stringent test of the models, however, is provided by the analysis of the full distributions of the various quantities, which should reproduce the scale-free features of the real Internet.

FIG. 9: a) Integrated betweenness distribution for the MR, GBA, and fitness models, compared with the result from the AS98 map. The full line on top has slope $-1.1$, corresponding to the Internet map. b) Average betweenness $b_k$ as a function of the node's connectivity $k$ for the same networks. The full line has slope 1.0.

The betweenness distributions $p_b(b)$ of the three models give qualitatively similar results. The integrated betweenness distributions $P_b(b)$ obtained are plotted in Fig. 9(a). Both the MR and the fitness models follow a power-law decay $p_b(b) \sim b^{-\delta}$ with an exponent $\delta \simeq 2$, in agreement with the value obtained from the AS maps. The GBA model shows an appreciable bending which, nevertheless, is compatible with the experimental Internet behavior. These results agree with the numerical prediction of Ref. [30] and support the conjecture that the exponent $\delta \simeq 2.2$ is a universal quantity in all scale-free networks with $2 < \gamma < 3$. To further inspect the betweenness properties, we plot in Fig. 9(b) the average betweenness $b_k$ as a function of the connectivity. In this case, the MR and GBA models yield an exponent $\beta \simeq 1$ (with $b_k \sim k^\beta$), compatible with the AS maps, while the fitness model exhibits a somewhat larger exponent, close to 1.4. Also in this case, the finite-size logarithmic corrections present in the fitness model could play a determining role in this discrepancy.
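The node betweenness $b$ and its average $b_k$ over nodes of connectivity $k$ can be computed with Brandes' algorithm for unweighted graphs. The sketch below is that standard technique in plain Python, not necessarily the procedure used by the authors; counting ordered pairs $(s, t)$ is our normalization choice and only rescales $b$ by a factor of 2 relative to unordered pairs. Function names are hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm: b_i sums, over ordered pairs (s, t) with
    s != i != t, the fraction of shortest s-t paths passing through i."""
    b = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)       # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:                        # BFS from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)     # dependency of s on each node
        while stack:                        # accumulate in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                b[w] += delta[w]
    return b


def betweenness_by_degree(adj, b):
    """b_k: average betweenness of the nodes with connectivity k."""
    groups = {}
    for v, nbrs in adj.items():
        groups.setdefault(len(nbrs), []).append(b[v])
    return {k: sum(vals) / len(vals) for k, vals in sorted(groups.items())}
```

For a star with four leaves, every one of the $4 \times 3$ ordered leaf pairs routes through the hub, so the hub gets $b = 12$ and each leaf $b = 0$.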

While properties related to the betweenness do not appear to pinpoint a major difference among the models, the most striking test is provided by analyzing their correlation properties. In Figs. 10 and 11 we report the average clustering coefficient as a function of the connectivity, $C_k$, and the average connectivity of the neighbors, $\langle k_{nn}\rangle$, respectively. The data from the Internet maps show a nontrivial $k$ structure that, as discussed in previous Sections, is due to scale-free correlation properties among nodes. These properties depend in turn upon the underlying hierarchy of the Internet structure. The only model that renders results in qualitative agreement with the Internet maps is the fitness model. On the contrary, the MR and GBA models completely fail, producing quantities that are almost independent of $k$. The reason for this striking difference can be traced back to the lack of correlations among nodes, which in the MR model is imposed by construction (the model is a random network with a fixed connectivity distribution), and in the GBA model is due to the destruction of correlations by the random rewiring mechanism.
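The clustering spectrum $C_k$ compared in Fig. 10 can be computed directly from the adjacency structure. In this minimal sketch (hypothetical names) the local clustering is $c_i = 2e_i/[k_i(k_i-1)]$, where $e_i$ is the number of links among the neighbors of node $i$; assigning $c_i = 0$ to nodes with $k_i < 2$ is a convention we adopt here, and other choices (such as excluding those nodes) change the average $\langle c\rangle$.

```python
def clustering(adj):
    """Local clustering c_i = 2 e_i / (k_i (k_i - 1)).

    e_i = number of links among the neighbors of i. Nodes with k_i < 2
    get c_i = 0 (a convention; some authors exclude them instead).
    """
    c = {}
    for i, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            c[i] = 0.0
            continue
        # each neighbor-neighbor link is seen from both ends, hence / 2
        e = sum(1 for u in nbrs for v in adj[u] if v in nbrs) / 2
        c[i] = 2.0 * e / (k * (k - 1))
    return c


def clustering_by_degree(adj, c):
    """C_k: average clustering coefficient of nodes with connectivity k."""
    groups = {}
    for i, nbrs in adj.items():
        groups.setdefault(len(nbrs), []).append(c[i])
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}
```

For an uncorrelated network such as the MR model, $C_k$ computed this way is expected to be nearly flat in $k$, whereas a decreasing $C_k$ signals the hierarchical structure seen in the AS maps.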



The general analytic study of connectivity correlations in growing network models can be found in Ref. |?2|, and it is worth noticing that a $k$-structure in correlation functions, as probed by the quantity $\langle k_{nn}\rangle$, does not arise in all growing network models. In this perspective, we can use correlation properties as one of the discriminating features among various models that show the same scale-free connectivity exponent.

FIG. 10: Clustering coefficient $C_k$ as a function of the connectivity $k$ for the MR, GBA, and fitness models, compared with the result from the AS98 map. The full line has slope $-0.75$.

The fitness model is able to reproduce the nontrivial correlation properties because of the fitness parameter of each node, which mimics the different hierarchical, economic, and geographical constraints of the Internet growth. Since the model embeds many features in one single parameter, we have to consider it just as a very first step towards a more realistic modeling of the Internet. In this perspective, models in which the attachment rate depends on both the connectivity and the real-space distance between two nodes have been studied in [13], fTT|. These models seem to give a better description of the Internet topology. In particular, the model of Ref. [13] includes geographical constraints, obtaining that, on average, the probability to connect to a given node scales linearly with its connectivity and is inversely proportional to the distance to that node. A comparison with real data is, in this case, more difficult because Internet maps generally lack geographical and economic information.
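The average attachment rule quoted above (probability linear in the connectivity and inversely proportional to the distance) can be written as a normalized weight. The sketch below is a toy illustration of that rule only, not a reimplementation of the model of Ref. [13], whose full definition is not reproduced here; node positions, the Euclidean metric, and the function name are our assumptions, and all distances are assumed to be strictly positive.

```python
import math


def geographic_attachment(degrees, positions, new_pos):
    """Probability that a new node at new_pos links to each existing node i,
    taken proportional to k_i / d_i (d_i = Euclidean distance, assumed > 0).

    degrees   -- dict: node -> connectivity k_i
    positions -- dict: node -> coordinate tuple
    """
    weights = {i: k / math.dist(positions[i], new_pos)
               for i, k in degrees.items()}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}
```

For instance, a node with twice the connectivity but twice the distance receives the same weight, which is how geography can partly offset the rich-get-richer effect.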



VIII. SUMMARY AND CONCLUSIONS

In summary, we have shown that the Internet maps exhibit a stationary scale-free topology, characterized by nontrivial connectivity correlations. An investigation of the Internet dynamics confirms the presence of a preferential attachment that behaves linearly with the nodes' connectivity, and identifies two different dynamical regimes during the nodes' evolution. We have compared several models of scale-free networks with the experimental data obtained from the AS maps. While all the models seem to capture the scale-free connectivity distribution, correlation and clustering properties are captured only by models that take into account several other ingredients, such as the nodes' hierarchy, resource constraints, and geographical location. Other ingredients that should be included in Internet modeling concern the wiring among existing nodes and the age effects that our analysis shows to be an appreciable feature of the Internet evolution. The results presented in this work show that the understanding and modeling of the Internet is an interesting and stimulating problem that needs the cooperative efforts of data analysis and theoretical modeling.

FIG. 11: Average connectivity of the nearest neighbors of a node as a function of the connectivity $k$ for the MR, GBA, and fitness models, compared with the result from the AS98 map. The AS98 data have been binned for the sake of clarity. The full line has slope $-0.5$.